Skip to content

api-query: Continue device initialization when a backend fails#1032

Open
MarijnS95 wants to merge 1 commit intollvm:mainfrom
Traverse-Research:api-query-continue-on-backend-init-failure
Open

api-query: Continue device initialization when a backend fails#1032
MarijnS95 wants to merge 1 commit intollvm:mainfrom
Traverse-Research:api-query-continue-on-backend-init-failure

Conversation

@MarijnS95
Copy link
Copy Markdown
Contributor

@MarijnS95 MarijnS95 commented Mar 30, 2026

Device::initialize() returns early when any backend fails to initialize, preventing subsequent backends from being discovered. For example, a Vulkan initialization failure (e.g. VK_ERROR_INCOMPATIBLE_DRIVER when MoltenVK is not installed) blocks Metal device discovery on macOS.

Collect backend initialization errors with joinErrors() and return them together instead. Also make lit.cfg.py resilient to an empty device list from api-query.

Test plan

  • Verified api-query outputs Metal device when Vulkan init fails (no MoltenVK)
  • Verified check-hlsl runs Metal tests successfully with this change

🤖 Generated with Claude Code

@MarijnS95 MarijnS95 force-pushed the api-query-continue-on-backend-init-failure branch from f4cbd80 to de099b0 Compare March 30, 2026 13:14
@MarijnS95 MarijnS95 force-pushed the api-query-continue-on-backend-init-failure branch from de099b0 to d9b6e6a Compare March 30, 2026 16:35
@MarijnS95 MarijnS95 requested a review from llvm-beanz March 31, 2026 07:51
@manon-traverse
Copy link
Copy Markdown
Contributor

LGTM

@MarijnS95 MarijnS95 force-pushed the api-query-continue-on-backend-init-failure branch from d9b6e6a to 0fe8047 Compare April 2, 2026 16:17
@bogner
Copy link
Copy Markdown
Contributor

bogner commented Apr 6, 2026

A few things:

  1. This has conflicts due to Restructure how devices are created and automatically cleaned up. #1035 and needs to be updated
  2. Wouldn't we also need a change in api-query.cpp to not abort after calling initializeDevices? This might be something that changed since Restructure how devices are created and automatically cleaned up. #1035
  3. The commit message here could be significantly simplified, which would help make it clear what this does. Commit messages shouldn't repeat every change that a commit will make, but instead focus on pointing out why the change is being made. Here, it would suffice to point out the problem (we quit after each device) and the high level of how we're fixing it (collect the errors, report them, and continue on).

Device::initialize() returns early when any backend fails to initialize,
preventing subsequent backends from being discovered. For example, a
Vulkan initialization failure (e.g. VK_ERROR_INCOMPATIBLE_DRIVER when
MoltenVK is not installed) blocks Metal device discovery on macOS.

Collect backend initialization errors with joinErrors() and return them
together instead. Also make lit.cfg.py resilient to an empty device list
from api-query.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@MarijnS95 MarijnS95 force-pushed the api-query-continue-on-backend-init-failure branch from 0fe8047 to 1d5330c Compare April 7, 2026 06:35
@MarijnS95
Copy link
Copy Markdown
Contributor Author

  1. The commit message here could be significantly simplified, which would help make it clear what this does. Commit messages shouldn't repeat every change that a commit will make, but instead focus on pointing out why the change is being made. Here, it would suffice to point out the problem (we quit after each device) and the high level of how we're fixing it (collect the errors, report them, and continue on).

@bogner just to make sure we're talking about the same commit message:

Device::initialize() returns early when any backend fails to initialize, preventing subsequent backends from being discovered. For example, a Vulkan initialization failure (e.g. VK_ERROR_INCOMPATIBLE_DRIVER when MoltenVK is not installed) blocks Metal device discovery on macOS.

Collect backend initialization errors with joinErrors() and return them together instead. Also make lit.cfg.py resilient to an empty device list from api-query.

This explains why something is wrong together with an example case of when that has an effect (Vulkan init on MacOS), followed by how it was fixed (using joinError(), and the drive-by note aboutlit.cfg.py not allowing empty yaml lists initially).

Is that still too much?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants